9 research outputs found

    How to improve TTS systems for emotional expressivity

    Get PDF
    Several experiments have revealed weaknesses in the emotional expressivity of current Text-To-Speech (TTS) systems. Although some TTS systems allow XML-based representations of prosodic and/or phonetic variables, few publications have considered, as a pre-processing stage, the use of intelligent text processing to detect affective information that can be used to tailor the parameters needed for emotional expressivity. This paper describes a technique for automatic prosodic parameterization based on affective cues. The technique recognizes the affective information conveyed in a text and, according to its emotional connotation, assigns appropriate pitch accents and other prosodic parameters through XML tagging. This pre-processing helps the TTS system generate synthesized speech that carries emotional cues. The experimental results are encouraging and suggest that suitable emotional expressivity in speech synthesis is achievable.
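The affect-detection and XML-tagging pipeline the abstract describes can be sketched roughly as follows. This is a toy illustration, not the paper's method: the keyword lexicon, the emotion labels, and the pitch/rate values are all invented for the example, and the output uses SSML-style `<prosody>` markup as a stand-in for whatever XML schema the target TTS system accepts.

```python
# Toy affect lexicon: maps cue words to an emotion (assumption; a real
# system would use richer linguistic resources).
AFFECT_LEXICON = {
    "wonderful": "happy", "celebrate": "happy",
    "tragic": "sad", "loss": "sad",
}

# Illustrative prosodic settings per emotion (invented values).
PROSODY = {
    "happy": {"pitch": "+15%", "rate": "+10%"},
    "sad":   {"pitch": "-10%", "rate": "-15%"},
}

def detect_affect(sentence):
    """Return the first emotion whose cue word appears in the sentence."""
    for word in sentence.lower().split():
        emotion = AFFECT_LEXICON.get(word.strip(".,!?"))
        if emotion:
            return emotion
    return "neutral"

def tag_sentence(sentence):
    """Wrap the sentence in an SSML-style prosody tag matching its affect."""
    emotion = detect_affect(sentence)
    if emotion == "neutral":
        return sentence
    p = PROSODY[emotion]
    return f'<prosody pitch="{p["pitch"]}" rate="{p["rate"]}">{sentence}</prosody>'

print(tag_sentence("It was a tragic loss."))
# → <prosody pitch="-10%" rate="-15%">It was a tragic loss.</prosody>
```

The tagged string would then be handed to an XML-aware TTS front end; neutral sentences pass through unchanged.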

    An adaptive Speech Denoising System based on ICA with Voice Activity Detection

    No full text
    This contribution presents a system for adaptive speech denoising using Independent Component Analysis (ICA) and Voice Activity Detection (VAD) in low-SNR environments. The experiments consider instantaneous mixtures (two sources and two microphones): the proposed system identifies the noise contained in each noisy mixture, applies the most suitable of three block ICA methods (FastICA, Kernel ICA and JADE) and, after source separation, automatically identifies the estimated speech signal. The choice of ICA method is matched to the detected noise, the signal mixtures are non-linear, and the system extracts information that can be used for further pre- and/or post-processing and for improving the block ICA output. The process is completely automatic from source recording to output, and such a system has a wide range of applications and significant potential over conventional approaches.
    Report number: ; Degree conferred: 2008-03-24; Degree type: Master's; Degree: Master (Information Science and Technology); Degree number: ; Graduate school / department: Graduate School of Information Science and Technology, Department of Electronic Information
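The VAD component of such a pipeline can be illustrated with a minimal energy-based detector. This is a sketch under assumptions, not the paper's actual VAD: the frame length, the noise-floor estimate (minimum frame energy), and the threshold ratio are all invented for the example.

```python
import math

def short_time_energy(samples, frame_len=160):
    """Split samples into frames and return each frame's mean-square energy."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def vad(samples, frame_len=160, ratio=4.0):
    """Return a per-frame speech/non-speech decision list.

    The noise floor is estimated as the minimum frame energy; a frame is
    marked as speech if its energy exceeds `ratio` times that floor
    (an assumption, tuned per recording in practice).
    """
    energies = short_time_energy(samples, frame_len)
    floor = min(energies) + 1e-12   # avoid a zero threshold on silent input
    return [e > ratio * floor for e in energies]

# Toy signal: one near-silent frame followed by one loud frame.
quiet = [0.001 * math.sin(0.1 * n) for n in range(160)]
loud = [0.5 * math.sin(0.3 * n) for n in range(160)]
decisions = vad(quiet + loud)
print(decisions)   # → [False, True]
```

In the described system the VAD decisions would help identify which separated ICA output is the speech signal.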

    Noise removal using independent component analysis and voice activity detection (独立成分分析と音声区間検出による雑音除去)

    No full text
    This contribution presents a system for adaptive speech denoising using Independent Component Analysis (ICA) and Voice Activity Detection (VAD) in low-SNR environments. The experiments consider instantaneous mixtures (two sources and two microphones): the proposed system identifies the noise contained in each noisy mixture, applies the most suitable of three block ICA methods (FastICA, Kernel ICA and JADE) and, after source separation, automatically identifies the estimated speech signal. The choice of ICA method is matched to the detected noise, the signal mixtures are non-linear, and the system extracts information that can be used for further pre- and/or post-processing and for improving the block ICA output. The process is completely automatic from source recording to output, and such a system has a wide range of applications and significant potential over conventional approaches.

    Improving TTS synthesis for emotional expressivity by a prosodic parameterization of affect based on linguistic analysis

    No full text
    Several experiments have revealed weaknesses in the emotional expressivity of current Text-To-Speech (TTS) systems. Although some TTS systems allow XML-based representations of prosodic and/or phonetic variables, few publications have considered, as a pre-processing stage, the use of intelligent text processing to detect affective information that can be used to tailor the parameters needed for emotional expressivity. This paper describes a technique for automatic prosodic parameterization based on affective cues. The technique recognizes the affective information conveyed in a text and, according to its emotional connotation, assigns appropriate pitch accents and other prosodic parameters through XML tagging. This pre-processing helps the TTS system generate synthesized speech that carries emotional cues. The experimental results are encouraging and suggest that suitable emotional expressivity in speech synthesis is achievable.

    Emotional speech synthesis by sensing affective information from text

    No full text
    Speech can express subjective meanings and intents that, in order to be fully understood, rely heavily on its affective perception. Some Text-to-Speech (TTS) systems reveal weaknesses in their emotional expressivity, but this situation can be improved by better parameterization of the acoustic and prosodic parameters. This paper describes an approach for better emotional expressivity in a speech synthesizer. Our technique uses several linguistic resources to recognize emotions in a text and assigns appropriate parameters to the synthesizer to produce suitable speech synthesis. For evaluation purposes we used the MARY TTS system to read out "happy" and "sad" news. The preliminary perceptual test results are encouraging: human judges, listening to the synthesized speech obtained with our approach, could perceive "happy" emotions much better than when they listened to non-affective synthesized speech.
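The "happy"/"sad" news classification step can be sketched with a simple cue-word counter. This is illustrative only: the abstract's approach uses several linguistic resources, whereas here the word lists are hand-made assumptions and the decision is a bare majority count.

```python
# Hand-made cue-word lists (assumptions for the example).
HAPPY_WORDS = {"win", "wins", "joy", "success", "record", "celebrates"}
SAD_WORDS = {"dies", "crash", "loss", "disaster", "mourning", "fails"}

def classify_news(text):
    """Label text 'happy', 'sad', or 'neutral' by counting cue words."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    happy = sum(w in HAPPY_WORDS for w in words)
    sad = sum(w in SAD_WORDS for w in words)
    if happy == sad:
        return "neutral"
    return "happy" if happy > sad else "sad"

print(classify_news("Local team wins the cup, city celebrates"))  # → happy
print(classify_news("Market crash brings heavy loss"))            # → sad
```

The resulting label would then select the prosodic parameter set passed to the synthesizer.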

    An automatic approach to virtual living based on environmental sound cues

    No full text
    This paper presents a novel indoor and outdoor monitoring system based on sound cues that can be used for the automatic creation of a Life-Log, for health-care monitoring, and/or for ambient communication with virtual worlds. The system detects daily-life activities (e.g., laughing, talking, traveling, cooking, sleeping) and situational references (e.g., inside a train, at a park, at home, at school) by processing environmental sounds, creates a Life-Log, and recreates those activities in a virtual world. It is easily extensible, portable, and feasible to implement, and it offers advantages and originality compared with other life-sensing systems. The results of the perceptual tests are encouraging: the system performed satisfactorily in a noisy environment and attracted the attention and curiosity of the subjects.
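The Life-Log construction step can be sketched as follows, assuming a sound classifier (not shown) that emits one activity label per time step: runs of identical labels are collapsed into timed activity events. The label names and one-second step are invented for the example.

```python
def build_lifelog(labels, step_s=1):
    """Collapse a label-per-frame sequence into (start_s, end_s, activity) events."""
    events = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current event when the label changes or the input ends.
        if i == len(labels) or labels[i] != labels[start]:
            events.append((start * step_s, i * step_s, labels[start]))
            start = i
    return events

# Toy classifier output: 3 s cooking, 2 s talking, 4 s inside a train.
labels = ["cooking"] * 3 + ["talking"] * 2 + ["train"] * 4
print(build_lifelog(labels))
# → [(0, 3, 'cooking'), (3, 5, 'talking'), (5, 9, 'train')]
```

Each event could then be logged and replayed as the corresponding activity in the virtual world.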